ABSTRACT

Results obtained by the Kuder-Richardson formula (20) adapted
for use with R-KW scoring are compared with three other reliability
formulas. Based on parallel tests administered at the same sitting, the
KR (20) estimates are compared with alternate-form correlations and with
odd-even correlations adjusted by the Spearman-Brown prophecy formula.
Comparisons are also made between KR (20) estimates and alternate-form
correlations obtained for tests administered after intervals of six to
ten months. All the results justify the use of the Kuder-Richardson
procedure with tests that show no more than moderate speededness.
AN ASSESSMENT OF THE KUDER-RICHARDSON FORMULA (20) RELIABILITY ESTIMATE

FOR MODERATELY SPEEDED TESTS



For some measurement specialists there continues to be doubt as to the
appropriateness of the Kuder-Richardson formula (20) and its close relatives for
estimating reliability unless all the examinees finish the test. This point of
view raises a question of considerable importance because in large-scale testing
programs it is frequently impractical to provide sufficient time to satisfy the
slowest students. Consequently, assigned time limits are likely to represent a
compromise between ideal power-test conditions and conditions that may introduce
a moderate factor of speededness. The view that has been generally accepted at
Educational Testing Service is that a test may be regarded as essentially unspeeded
if at least 80 per cent of the examinees reach the last item and if virtually every
one reaches three-quarters of the items. Some ETS tests do not quite meet both
conditions. Nevertheless, the Kuder-Richardson formulas have been used with a high
degree of confidence that they provide good estimates of test reliability. It is
the purpose of this paper to present evidence that justifies that confidence. The
Scholastic Aptitude Test happens to provide such evidence without any need for
special testing. The conclusions to be drawn are properly restricted to test material
similar to that of the SAT, although there is every reason to believe that
generalizations can be made to other tests with similar speed characteristics.
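The two conditions for an essentially unspeeded test can be checked mechanically from a record of the last item each examinee reached. The following is a minimal sketch, not an ETS procedure: the function name, the rounding of "three-quarters of the items" up to a whole item, and the 99 per cent cutoff standing in for "virtually every one" are all assumptions made for illustration.

```python
import math


def speededness_summary(last_reached, n_items, near_all=0.99):
    """Summarise speededness from each examinee's last item reached (1-based).

    near_all is a hypothetical threshold standing in for "virtually every one".
    """
    n = len(last_reached)
    # Assumption: "three-quarters of the items" is rounded up to a whole item.
    three_quarter_item = math.ceil(0.75 * n_items)

    def reach(item):
        # Proportion of examinees who got at least as far as the given item.
        return sum(1 for r in last_reached if r >= item) / n

    prop_last = reach(n_items)
    prop_three_quarters = reach(three_quarter_item)
    # Count of items reached by fewer than 80 per cent of the group.
    items_below_80 = sum(
        1 for item in range(1, n_items + 1) if reach(item) < 0.80
    )
    return {
        "prop_reaching_last_item": prop_last,
        "prop_reaching_three_quarters": prop_three_quarters,
        "items_reached_by_less_than_80_per_cent": items_below_80,
        "essentially_unspeeded": prop_last >= 0.80
        and prop_three_quarters >= near_all,
    }


# Invented data: 100 examinees on a 40-item section.
example = speededness_summary([40] * 85 + [35] * 14 + [25], n_items=40)
print(example)
```

The third quantity in the summary is the count of items reached by less than 80 per cent of the group, which is zero whenever the first condition is met.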

KR (20) versus Alternate-Form, Same Administration

The analysis sample for each new form of the SAT is selected from the
records of candidates who took one of the equating sections. Each equating section
is a parallel form of one of the operational sections with respect to content, timing,
and number of items. Listed in Table 1 are data for the two parallel sections, A
and B, in thirty SAT forms. Sample sizes range from 370 to 2,000. From the per cents
who reached three-quarters of the items, it is seen that our first condition for an
unspeeded test is approximated for all the verbal sections and that the mathematical
sections fail to meet it by about 1 to 4 per cent in general and by as much as 11.6
per cent in one instance. Instead of the per cent reaching the last item, our second
condition for an unspeeded test, there has been recorded the number of items reached
by less than 80 per cent of the group. These figures, too, suggest more speed in the
mathematical scores than in the verbal scores.
Table 1

Comparison of Kuder-Richardson Formula (20) Estimates
with Alternate-Form Correlations, Same Administration

                 Per Cent Who      Number of Items
                 Reached Three-    Reached by Less   KR (20)
                 quarters of       Than 80 Per Cent  Reliability
                 Items             of Group          Estimate
Test        N         A       B         A       B         A       B    r_AB

40-Item Verbal Sections

   1      [?]      99.7    98.9       [?]       2      .854    .878    .857
   2      900      99.9    99.8         2       1      .826    .825    .809
   3      900       [?]    99.4         1       0      .832    .850    .821
   4      [?]      99.7    99.6         1       1      .828    .847    .833
   5      900      99.4    99.6         3       1      .815    .844    .818
   6      900      99.9   100.0         1       0      .825    .827    .821
   7    1,995      99.5    99.2         1       3      .849    .850    .839
   8      845      99.8    99.9         0       1      .844    .869    .842
   9      900      99.9    99.6         0       3      .851    .861    .848
  10      900      98.6    99.4         2       1      .808    .863    .809
  11      900     100.0    99.8         3       2      .815    .848    .815
  12      495      99.4   100.0         2       2      .796    .848    .804
  13      370     100.0   100.0         1       1      .828    .820    .833
  14      370     100.0   100.0         2       2      .825    .855    .832

25-Item Mathematical Sections

  15      865      96.5    97.1         2       3      .825    .812    .818
  16    1,885      96.2    88.4         2       6      .828    .791    .802
  17      955      96.8    94.7         2       3      .781    .816    .789
  18      900      98.6    96.9         2       2      .827    .807    .814
  19    1,995      98.3    96.9         2       3      .850    .830    .832
  20      900      96.3    98.1         2       3      .835    .833    .833
  21      845      97.6    98.2         2       3      .823    .809    .798
  22      900      96.1    95.9         3       3      .827    .813    .810
  23      900      95.8    97.3         3       3      .801    .834    .808
  24      900      95.8    98.6         4       1      .790    .844    .804
  25      495      97.0    95.8         4       4      .803    .806    .815
  26      495      92.5    98.2         5       2      .784    .784    .778
  27      900      98.7    94.6         3       4      .793    .812    .815
  28      370      98.9    97.6         1       1      .814    .833    .797
  29      370      93.2    96.8         4       2      .780    .800    .806
  30      370      92.7    95.1         4       2      .809    .789    .813

[?] marks an entry that is illegible in the source copy.


Inasmuch as all the tests involved in this study have been scored by the
formula Score = R - KW, Dressel's adaptation of the Kuder-Richardson formula (20),
which renders it appropriate for use with such scoring, has been used (2). In no
instance did the two sections, A and B, appear consecutively in the test. Seven of
the verbal pairs were separated by additional verbal material; the other seven, by
both verbal and mathematical sections. On the other hand, the two mathematical
sections were all separated by additional mathematical material.
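For orientation, the unadapted Kuder-Richardson formula (20) for rights-only (0/1) item scores can be sketched as follows. This is the standard textbook form, KR (20) = [k/(k-1)] [1 - (sum of pq)/(variance of total scores)], and not Dressel's adaptation for R - KW scoring, which is not reproduced in this paper. The function name and the toy response matrix are illustrative assumptions.

```python
def kr20(responses):
    """Standard KR (20) for a matrix of 0/1 item scores.

    responses: list of examinee rows, each a list of k item scores (0 or 1).
    Uses the population (divide-by-n) variance of the total scores.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum over items of p * q, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Invented toy data: 4 examinees, 3 items.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(toy), 3))  # 0.75
```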

With these considerations in mind it is interesting to note that the
alternate-form reliability, r_AB, is not always the lowest estimate, as might
reasonably be expected, but, rather, it lies between the two K-R estimates in thirteen
tests, equals one of them in three, and is higher than either in five more. For the
verbal sections the mean value of r_AB is .827 and the mean KR (20) value is .839,
whereas means for the mathematical sections are .808 for r_AB and .812 for KR (20).

As for the effect of speededness, there is no evidence to support the
contention that the K-R estimate is inflated by the degree of speededness encountered
in these tests. Quite the other way: for this particular set of thirty-two 25-item
mathematical sections of Table 1, there is a positive correlation of .44 between the
KR (20) reliability estimate and the per cent of the group who reached three-quarters
of the items, and there is a negative correlation of .46 between the KR (20) estimate
and the number of items reached by less than 80 per cent of the group. Thus the
speededness shown for these 32 tests tends to be accompanied by slightly lower
Kuder-Richardson estimates rather than higher values.
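The two correlations just cited can be recomputed from the mathematical rows of Table 1. The sketch below assumes an ordinary product-moment (Pearson) correlation over the 32 sections, which the paper does not state explicitly, and uses the table values as they could be read from this copy, so small discrepancies from the reported figures are possible. The variable and function names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# (per cent reaching three-quarters, items reached by < 80 per cent, KR (20))
# for section A and then section B of tests 15 to 30 in Table 1.
MATH_SECTIONS = [
    (96.5, 2, .825), (96.2, 2, .828), (96.8, 2, .781), (98.6, 2, .827),
    (98.3, 2, .850), (96.3, 2, .835), (97.6, 2, .823), (96.1, 3, .827),
    (95.8, 3, .801), (95.8, 4, .790), (97.0, 4, .803), (92.5, 5, .784),
    (98.7, 3, .793), (98.9, 1, .814), (93.2, 4, .780), (92.7, 4, .809),
    (97.1, 3, .812), (88.4, 6, .791), (94.7, 3, .816), (96.9, 2, .807),
    (96.9, 3, .830), (98.1, 3, .833), (98.2, 3, .809), (95.9, 3, .813),
    (97.3, 3, .834), (98.6, 1, .844), (95.8, 4, .806), (98.2, 2, .784),
    (94.6, 4, .812), (97.6, 1, .833), (96.8, 2, .800), (95.1, 2, .789),
]

per_cent = [row[0] for row in MATH_SECTIONS]
items_short = [row[1] for row in MATH_SECTIONS]
kr20_values = [row[2] for row in MATH_SECTIONS]

r_per_cent = pearson(per_cent, kr20_values)
r_items = pearson(items_short, kr20_values)
print(f"KR (20) vs per cent reaching three-quarters: {r_per_cent:+.2f}")
print(f"KR (20) vs items reached by < 80 per cent:   {r_items:+.2f}")
```

A positive first coefficient and a negative second one both point the same way: the more speeded sections tend to have the lower KR (20) estimates.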

KR (20) versus Odd-Even, Same Administration

The data of Tables 2, 3, 4, and 5 are based on a single form of the SAT.
Score A is a 40-item operational section and Scores C and D are 40-item equating
sections that parallel A. Similarly, Scores B, E, and F are parallel 25-item
sections. The four samples, of over 1,200 cases each, are mutually exclusive. The
new data provided in these tables are intercorrelations, means, and standard
deviations for scores on the odd-numbered and even-numbered items. In each table the
"Total" rows contain in the last five columns the alternate-form reliability, the
KR (20) estimates, the odd-even correlations stepped up by the Spearman-Brown
prophecy formula, and, in the last column, the estimate that employs the odd and
even variances and covariance. This last formula, attributed by Kelley (3) to
Table 2

Comparison of Kuder-Richardson Formula (20) Estimates
with Estimates Based on Scores on Odd and Even Items

Sample 1, 40-Item Sections, 1,295 Cases

                                Intercorrelations               Reliability Estimate
                           A      A      C      C      A      C
Score      Mean  S.D.    Odd   Even    Odd   Even  Total  Total    (a)    (b)    (c)

A Odd      6.39  4.40          .735   .735   .754   .936   .797   .745
A Even     6.26  4.08   .735          .704   .717   .923   .758   .696
C Odd      7.28  4.38   .735   .704          .747   .772   .931   .733
C Even     6.54  4.53   .754   .717   .747          .790   .934   .758
A Total   12.55  7.91   .936   .923   .772   .790          .835   .840   .847   .846
C Total   13.68  8.31   .797   .758   .931   .934   .835          .854   .855   .855

Per cent who reached three-quarters of items        99.3   99.4
Number of items reached by less than
  80 per cent of group                                 3      1

(a) KR (20)
(b) 2 r_OE / (1 + r_OE)
(c) Estimate employing the odd and even variances and covariance (see text)

Table 3

Comparison of Kuder-Richardson Formula (20) Estimates
with Estimates Based on Scores on Odd and Even Items

Sample 2, 40-Item Sections, 1,270 Cases

                                Intercorrelations               Reliability Estimate
                           A      A      D      D      A      D
Score      Mean  S.D.    Odd   Even    Odd   Even  Total  Total    (a)    (b)    (c)

A Odd      6.42  4.53          .737   .748   .737   .940   .797   .761
A Even     6.33  4.01   .737          .697   .695   .920   .747   .684
D Odd      7.61  4.49   .748   .697          .742   .778   .933   .754
D Even   7.0[?]  4.42   .737   .695   .742          .771   .930   .745
A Total   12.62  7.92   .940   .920   .778   .771          .831   .841   .849   .845
D Total   14.53  8.31   .797   .747   .933   .930   .831          .856   .852   .852

Per cent who reached three-quarters of items        99.0   99.8
Number of items reached by less than
  80 per cent of group                                 3      2

See footnote to Table 2 for (a), (b), (c) references.
[?] marks a digit that is illegible in the source copy.
Table 4

Comparison of Kuder-Richardson Formula (20) Estimates
with Estimates Based on Scores on Odd and Even Items

Sample 3, 25-Item Sections, 1,295 Cases

                                Intercorrelations               Reliability Estimate
                           B      B      E      E      B      E
Score      Mean  S.D.    Odd   Even    Odd   Even  Total  Total    (a)    (b)    (c)

B Odd      4.70  3.08          .693   .708   .706   .924   .770   .700
B Even     4.57  2.82   .693          .695   .676   .908   .751   .680
E Odd      4.96  3.17   .708   .695          .685   .763   .925   .701
E Even     4.71  2.78   .706   .676   .685          .750   .903   .649
B Total    9.14  5.38   .924   .908   .763   .750          .827   .814   .819   .817
E Total    9.53  5.45   .770   .751   .925   .903   .827          .807   .813   .809

Per cent who reached three-quarters of items        89.3   95.0
Number of items reached by less than
  80 per cent of group                                 5      3

See footnote to Table 2 for (a), (b), (c) references.



Table 5

Comparison of Kuder-Richardson Formula (20) Estimates
with Estimates Based on Scores on Odd and Even Items

Sample 4, 25-Item Sections, 1,275 Cases

                                Intercorrelations               Reliability Estimate
                           B      B      F      F      B      F
Score      Mean  S.D.    Odd   Even    Odd   Even  Total  Total    (a)    (b)    (c)

B Odd      4.87  3.21          .735   .705   .704   .936   .768   .731
B Even     4.64  2.92   .735          .706   .669   .919   .750   .715
F Odd      5.05  3.10   .705   .706          .686   .757   .917   .706
F Even     5.07  3.03   .704   .669   .686          .739   .912   .695
B Total    9.35  5.67   .936   .919   .757   .739          .815   .838   .847   .845
F Total    9.98  5.59   .768   .750   .917   .912   .815          .819   .814   .813

Per cent who reached three-quarters of items        86.8   94.7
Number of items reached by less than
  80 per cent of group                                 5      3

See footnote to Table 2 for (a), (b), (c) references.
Flanagan, is equivalent to KR (20) with the item replaced by the half test as
the basic unit. As Kelley notes, "The difference between this formula and [the
Spearman-Brown formula] is trifling." In six of the eight possible comparisons
of KR (20) with odd-even methods the latter provide slightly higher estimates
than the Kuder-Richardson formula. Comparisons of the KR (20) estimates for the
odd and even scores with the intercorrelations among those scores add nothing new
to the findings but serve simply to support the evidence already provided in the
data of Table 1.
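The two odd-even estimates can be illustrated with the Score A figures of Table 2 (odd and even standard deviations of 4.40 and 4.08, odd-even correlation of .735). The Spearman-Brown step-up is the formula given in the Table 2 footnote. For the variance-covariance estimate the sketch assumes the usual Flanagan split-half form, 4 r_OE s_O s_E / (s_O^2 + s_E^2 + 2 r_OE s_O s_E); that exact expression is an assumption here, chosen because it matches the description in the text. The function names are illustrative.

```python
def spearman_brown(r_oe):
    """Step an odd-even correlation up to full test length."""
    return 2 * r_oe / (1 + r_oe)


def flanagan(sd_odd, sd_even, r_oe):
    """Split-half estimate from the odd and even variances and covariance.

    Assumed form: 4 * cov / var(total), where the total-score variance is
    rebuilt from the two half-test variances and their covariance.
    """
    cov = r_oe * sd_odd * sd_even
    var_total = sd_odd ** 2 + sd_even ** 2 + 2 * cov
    return 4 * cov / var_total


# Score A, Table 2: S.D. odd = 4.40, S.D. even = 4.08, r_OE = .735
print(round(spearman_brown(0.735), 3))        # 0.847, column (b)
print(round(flanagan(4.40, 4.08, 0.735), 3))  # 0.846, column (c)
```

When the two half-test standard deviations are equal, the two expressions coincide, which is consistent with Kelley's remark that the difference between them is trifling.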

KR (20) versus Alternate-Form, Different Administrations

Finally, data have been assembled from ETS files to compare the Kuder-
Richardson reliability estimate that is provided for each new form of the SAT with
alternate-form correlations obtained for candidates who, for various reasons, take
a second form of the test after a period of six to ten months. Most of these
candidates are juniors at the time of their first testing and seniors at the second.
One example of this kind is given in Table 6, and another, for a different year, in
Table 7. Together, these tables involve ten different forms of the test. Each
verbal score and each mathematical score is based on two separately timed parts.
The KR (20) reliability and its associated standard error of measurement are
computed for each part and then combined by the expression, 1 - (error variance)/
(total variance), where the error variance is the sum of the two variance errors of
measurement and the total variance is the variance of the total score, to get the
total-test reliability. The results of this expression are the values in the first
five rows of each table. The verbal-score reliabilities for the ten forms range from
.914 to .928, and the mathematical-score reliabilities, from .885 to .918.
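The combining expression can be sketched in a few lines. The sketch assumes that each part's variance error of measurement is obtained in the usual way, as the part variance times one minus the part reliability; the function name and the numbers in the example are invented for illustration and are not SAT values.

```python
def combined_reliability(parts, total_sd):
    """Total-test reliability from separately timed parts.

    parts: list of (part_sd, part_reliability) pairs.
    total_sd: standard deviation of the total (combined) score.
    Computes 1 - (error variance) / (total variance), where the error
    variance is the sum of the parts' variance errors of measurement.
    """
    error_variance = sum(sd ** 2 * (1 - rel) for sd, rel in parts)
    return 1 - error_variance / total_sd ** 2


# Hypothetical two-part score: part S.D.s of 6.0 and 5.0 with KR (20)
# values of .85 and .80, and a total-score S.D. of 10.0.
print(round(combined_reliability([(6.0, 0.85), (5.0, 0.80)], 10.0), 3))  # 0.896
```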

The numerous factors that can affect correlations between measures separated
by a substantial time interval are generally known and will not be covered here. To
anyone who desires a discussion of this subject the treatment by R. L. Thorndike in
Lindquist (5) is recommended. Since the repeater groups of Tables 6 and 7 tend to be
slightly less variable than the analysis samples, correlations based on the repeater
groups may be expected to be correspondingly lower. In view of all the reasons one
can find for expecting a drop in the alternate-form correlations, the data of Tables 6
and 7 are particularly noteworthy. The mean verbal KR (20) reliability obtained
for the ten analysis samples represented in the two tables is .92. Even the lowest
Table 6

Comparison of Kuder-Richardson Formula (20) Estimates
with Alternate-Form Correlations, Different Administrations

First Example

                     Number      Verbal       Mathematical
Administration     of Cases    Mean   S.D.    Mean   S.D.       r_VM    r_VV    r_MM

Analysis samples:
  March                 900     461    109     487    113       .698    .921    .905
  May                   955     453    112     484    108       .685    .921    .885
  November            2,000     458    106     495    108        [?]    .916    .902
  December              [?]     452    111     487    115       .682    .915    .905
  January               900     434    114     466    114       .687    .928    .918

Repeater groups:
  March              45,843     481    103     511    105    .65-.68     .90     .88
  November                      491    105     528    106

  March              21,669     479    104     509    107    .65-.65     .90     .88
  December                      499    110     532    108

  March               3,262     458    108     490    111    .65-.68     .91     .89
  January                       474    112     504    113

  May               125,975     462    102     490    100    .63-.65     .89     .86
  November                      468    100     507    101

  May                66,421     459    104     490    104    .62-.65     .89     .87
  December                      477    105     514    106

  May                11,531     448    110     480    111    .67-.68     .90     .88
  January                       460    109     496    111

[?] marks an entry that is illegible in the source copy.

Table 7

Comparison of Kuder-Richardson Formula (20) Estimates
with Alternate-Form Correlations, Different Administrations

Second Example

                     Number      Verbal       Mathematical
Administration     of Cases    Mean   S.D.    Mean   S.D.       r_VM    r_VV    r_MM

Analysis samples:
  March                 865     446    111     476    113       .708    .928    .911
  April                 900     455    108     481    111       .692    .914    .907
  November              [?]     [?]    [?]     486    113        [?]     [?]     [?]
  December            2,000     448    108     481    110       .659    .915    .903
  January               935     431    111     475    113       .681    .925    .916

Repeater groups:
  March              39,977     470    102     503    104    .66-.57     .90     .88
  November                      481    106     512    109

  March              18,669     472    104     504    108    .66-.68     .90     .88
  December                      493    108     524    109

  March               2,707     445    107     481    111    .69-.72     .91     .89
  January                       456    108     498    109

  April             117,975     463    101     493    104    .65-.65     .89     .88
  November                      469    103     501    107

  April              56,405     457    102     488    107    .64-.66     .89     .83
  December                      471    104     506    106

  April               9,906     443    108     [?]    111    .67-.71     .90     .89
  January                       447    107     491    107

[?] marks an entry that is illegible in the source copy.
verbal alternate-form coefficient is as high as .89, and the mean of the twelve such
coefficients is .90. Similarly, the mean mathematical KR (20) estimate is about
.91, whereas the mean alternate-form coefficient is .88. These findings are perhaps
the most compelling of all to justify confidence in the use of the Kuder-Richardson
procedure with tests whose speed characteristics do not greatly differ from those
of the SAT.